CUDA Fortran for Scientists and Engineers by Massimiliano Fatica & Gregory Ruetsch

Author: Massimiliano Fatica & Gregory Ruetsch
Language: eng
Format: epub
ISBN: 9780124169722
Publisher: Elsevier Inc.
Published: 2013-09-15T16:00:00+00:00


3.5.2 Instruction-level parallelism

We have already seen an example of instruction-level parallelism in this book. In the transpose example of Section 3.4, a shared-memory tile of 32 × 32 was used in most of the kernels. But because the maximum number of threads per block is 512 on certain devices, it is not possible to launch a kernel with 32 × 32 = 1024 threads per block. Instead, we have to use a thread block with fewer threads and have each thread process multiple elements. In the transpose case, blocks of 32 × 8 = 256 threads were launched, with each thread processing four elements.

For the example in this section, we can modify the copy kernel to take advantage of instruction-level parallelism as follows:
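The listing itself is missing from this excerpt. As a stand-in, the following is a minimal sketch of a copy kernel in which each thread handles several elements. It is not the authors' code. The names `copy_m`, `copy_ILP`, `ILP`, and `tmp`, the explicit length argument `n`, the bounds checks, and the host test program are all assumptions made for illustration. The value `ILP = 4` is chosen only to mirror the four elements per thread of the transpose case.

```fortran
module copy_m
  use cudafor
  implicit none
  integer, parameter :: ILP = 4   ! elements per thread (assumed value)
contains
  ! Hypothetical kernel: each thread copies ILP elements, spaced one block
  ! width apart so that a warp's accesses stay contiguous for every k.
  attributes(global) subroutine copy_ILP(odata, idata, n)
    real, intent(out) :: odata(*)
    real, intent(in)  :: idata(*)
    integer, value    :: n
    real    :: tmp(ILP)
    integer :: i, k

    ! First element handled by this thread (1-based indexing).
    i = (blockIdx%x - 1)*blockDim%x*ILP + threadIdx%x

    ! Issue ILP independent loads before any store ...
    do k = 1, ILP
       if (i + (k-1)*blockDim%x <= n) tmp(k) = idata(i + (k-1)*blockDim%x)
    end do
    ! ... then write the loaded values back.
    do k = 1, ILP
       if (i + (k-1)*blockDim%x <= n) odata(i + (k-1)*blockDim%x) = tmp(k)
    end do
  end subroutine copy_ILP
end module copy_m

program copy_test
  use cudafor
  use copy_m
  implicit none
  integer, parameter :: n = 1024*1024, nThreads = 256
  real, allocatable         :: a(:), b(:)
  real, device, allocatable :: a_d(:), b_d(:)
  integer :: i, nBlocks, istat

  allocate(a(n), b(n), a_d(n), b_d(n))
  do i = 1, n
     a(i) = real(i)
  end do
  a_d = a        ! host-to-device copy
  b_d = 0.0

  ! Each block covers nThreads*ILP elements, so round up accordingly.
  nBlocks = (n + nThreads*ILP - 1) / (nThreads*ILP)
  call copy_ILP<<<nBlocks, nThreads>>>(b_d, a_d, n)
  istat = cudaDeviceSynchronize()

  b = b_d        ! device-to-host copy
  if (all(a == b)) then
     write(*,*) 'copy_ILP: Test Passed'
  else
     write(*,*) 'copy_ILP: Test Failed'
  end if
  deallocate(a, b, a_d, b_d)
end program copy_test
```

Because the loads in the first loop do not depend on one another, the hardware can have several of them in flight per thread, which is the point of the technique. The grid is sized so that each block covers `nThreads*ILP` elements rather than `nThreads`, so the same array is copied with a quarter as many blocks as a one-element-per-thread kernel would need.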


